Flexible query formulation for federated search

Authors

  • Matthew Michelson
  • Sofus A. Macskassy
  • Steven N. Minton
Abstract

One common framework for data integration in practice is federated search. Here, an agent queries disjoint sources simultaneously and then clusters the returned records in the absence of unique keys. However, formulating the correct queries to the sources can be challenging because of possible variations in query values. For instance, some sources may record a first name as “John” while other sources use “Jonathan” for the same person. If the underlying sources do not support sophisticated matching, then a single query of “John” will miss many records from the “Jonathan” sources. This paper presents an approach to formulating queries for federated search that leverages automatically discovered transformations, such as synonyms and abbreviations, to create the set of possible queries for the given sources. Our preliminary results demonstrate that transformations mined from a subset of sources do apply to a new, distinct source, thereby allowing query expansions based on the discovered transformations.
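
The query-expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformation table is hand-written here (the paper mines transformations automatically), the names `TRANSFORMATIONS`, `build_symmetric_index`, and `expand_query` are hypothetical, and treating every transformation as applicable in both directions is an assumption made for illustration.

```python
from itertools import product

# Hypothetical, hand-written transformation table. In the paper these
# pairs are discovered automatically from a subset of the sources.
TRANSFORMATIONS = {
    "john": {"jonathan"},
    "street": {"st"},
}


def build_symmetric_index(transformations):
    """Index each transformation in both directions (an assumption)."""
    index = {}
    for source, variants in transformations.items():
        for variant in variants:
            index.setdefault(source, set()).add(variant)
            index.setdefault(variant, set()).add(source)
    return index


def expand_query(query, transformations):
    """Return every query variant reachable by swapping single tokens."""
    index = build_symmetric_index(transformations)
    options = []
    for token in query.lower().split():
        # Each token may stay as-is or be replaced by a known variant.
        options.append(sorted({token} | index.get(token, set())))
    # The Cartesian product enumerates all combinations of token choices.
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for variant in expand_query("John Smith", TRANSFORMATIONS):
        print(variant)
```

Under these assumptions, a query for "John Smith" expands to both "john smith" and "jonathan smith", so a source that stores only the longer form is still matched. Each variant would then be issued to every source that lacks its own fuzzy matching.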

Similar resources

SPARQL Query Formulation and Execution using FedViz

Health care and life sciences research heavily relies on the ability to search, discover, formulate and correlate data from distinct sources. Although the Semantic Web and Linked Data technologies help in dealing with the data integration problem, there remains a barrier to adopting these for non-technical research audiences. In this paper we present FedViz, a visual interface for SPARQL query formula...

Federated Query Formulation and Processing through BioFed

A single interface for accessing life sciences (LS) data is a natural need to master the data deluge in this domain. The data in the LS requires integration and current integrative solutions increasingly rely on the federation of queries for distributed resources. This paper demonstrates BioFed, a federated SPARQL query processing system customised for LS-LOD. BioFed enables user to formulate a...

FedBench: A Benchmark Suite for Federated Semantic Data Query Processing

In this paper we present FedBench, a comprehensive benchmark suite for testing and analyzing the performance of federated query processing strategies on semantic data. The major challenge lies in the heterogeneity of semantic data use cases, where applications may face different settings at both the data and query level, such as varying data access interfaces, incomplete knowledge about data so...

Query Transformations for Result Merging

This paper describes Carnegie Mellon University’s entry at the TREC 2014 Federated Web Search track (FedWeb14). Federated search pipelines typically have two components: (i) resource-selection, and (ii) result-merging. This work documents experiments to modify queries to merge results in the federated-search pipeline. Approaches from previous attempts at solving this problem involved custom que...

Federated Query Processing: Challenges and Opportunities

The increasing numbers and volumes of RDF datasets are accompanied by increasingly complex information needs. Addressing such information needs commonly requires using federated queries, which are executed over several knowledge bases to compute a result set. The aim of this invited paper is to provide an overview of current challenges and opportunities in federated query processing. To this en...

Publication date: 2009